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Abstract 

A random matrix is likely to be well conditioned, and motivated by this well known property 
we employ random matrix multipliers to advance some fundamental matrix computations. This 
includes numerical stabilization of Gaussian elimination with no pivoting as well as block Gaus- 
sian elimination, approximation of the leading and trailing singular spaces of an ill conditioned 
matrix, associated with its largest and smallest singular values, respectively, and approximation 
of this matrix by low-rank matrices, with further extensions to Tensor Train approximation and 
the computation of the numerical rank of a matrix. We formally support the efficiency of the 
proposed techniques where we employ Gaussian random multipliers, but our extensive tests have 
consistently produced the same outcome where instead we used sparse and structured random 
multipliers, defined by much fewer random parameters compared to the number of their entries. 

2000 Math. Subject Classification: 15A52, 15A12, 15A06, 65F22, 65F05 

Key Words: Random matrices, Random multipliers, GENP, Low-rank approximation. Numerical 
rank 



1 Introduction 

It is well known that A random matrix is likely to be well conditioned |D88| , |E88j , |ES05) , |CD05j , 
|SST06| . |Bllj . and motivated by this well known property we apply randomized matrix multi- 
plication to advance some fundamental matrix computations. We stabilize numerically Gaussian 
elimination with no pivoting as well as block Gaussian elimination, approximate leading and trail- 
ing singular spaces of an ill conditioned matrix A, associated with its largest and smallest singular 

*Some results of this paper have been presented at the ACM-SIGSAM International Symposium on Symbolic and 
Algebraic Computation (ISSAC '2011), San Jose, CA, 2011, the 3nd International Conference on Matrix Methods in 
Mathematics and Applications (MMMA 2011) in Moscow, Russia, June 22-25, 2011, the 7th International Congress 
on Industrial and Applied Mathematics (ICIAM 2011), in Vancouver, British Columbia, Canada, July 18-22, 2011, 
the SIAM International Conference on Linear Algebra, in Valencia, Spain, June 18-22, 2012, and the Conference on 
Structured Linear and Multilinear Algebra Problems (SLA2012), in Leuvcn, Belgium, September 10-14, 2012 
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values, respectively, approximate this matrix by low-rank matrices, and compute Tensor Train ap- 
proximation, the numerical rank of a matrix, and an approximation of a matrix by a structured 
matrix lying nearby. Our numerical tests are in good accordance with our formal study, except that 
in the tests all algorithms have fully preserved their power even where we dramatically decreased 
the number of random parameters involved by using sparse and structured multipliers. 

1.1 Numerically safe Gaussian elimination with no pivoting 

Hereafter "flop" stands for "arithmetic operation" , by saying "expect" and "likely" we mean "with 
probability 1 or close to 1", o'j{A) denotes the jth largest singular value of an n x n matrix A, and 
the ratio k{A) — ai{A) /ap{A) for p = rank(A) denotes its condition number. k{A) = \ \A\\ \\A~^\\ 
if p = n, that is if A is a nonsingular matrix. If this number is large in context, then the matrix 
A is ill conditioned, otherwise well conditioned. For matrix inversion and solving linear systems of 
equations the condition number represents the output magnification of input errors, 

1 1 OUTPUT ERRORII 
""^^^ IIINPUT ERRORII ' ^ ' 

and backward error analysis implies similar magnification of rounding errors |GL96] . |H02] . |S98| . 

To avoid dealing with singular or ill conditioned matrices in Gaussian elimination, one incorpo- 
rates pivoting, that is row or column interchange. Gaussian elimination with no pivoting (hereafter 
we refer to it as GENP) can easily fail in numerical computations with rounding errors, except for 
the cases where the input matrices are strongly well conditioned, that is where all their leading 
principal square blocks are nonsingular and well conditioned. In particular diagonally dominant as 
well as positive definite well conditioned matrices have this property. For such matrices, GENP 
outperforms Gaussian elimination with pivoting GL96, page 119]. Random matrices are likely to be 
strongly well conditioned, but we do not solve random linear systems of equations. We can, however 
randomize linear systems by applying random multipliers and then can apply GENP. We proposed 
and tested this approach in |PGMQ[ Section 12.2] and |PQZa| , and our tests consistently showed 
its efficiency even where we used just circulant or Householder multipliers filled with integers ±1 
and where we limited randomization to the choice of the signs ± (see our Table 17.11 and | PQZa[ 
Table 2]). Our Corollary |4T] supports these empirical observations provided that the multipliers are 
square Gaussian random matrices. Estimation of the condition numbers of structured matrices was 
stated as a challenge in |SST06) . The paper |PQa| presents some initial advance, but the problem 
remains largely open. 

1.2 Randomized low-rank approximation and beyond 

Our Corollary UTJ supporting randomized GENP, relies on the probabilistic estimates for the ranks 
and condition numbers of the products k,(GA) and k{AH) in terms of k{A) where G and H are 
Gaussian random matrices (see Theorem 14. ip . We also apply the same estimates to support ran- 
domized algorithms for the approximation of the leading singular spaces of an ill conditioned matrix 
A associated with its largest singular values. This can be immediately extended to the approxima- 
tion of a matrix having a small numerical rank by low-rank matrices. The algorithm is numerially 
safe, runs at a low computational cost, and has a great number of highly important applications 
to matrix computations |HMTll] . We point out its further extensions to the approximation of a 
matrix by a structured matrix lying nearby and to computing Tensor Train approximation and the 
numerical rank of a matrix. Then again our formal support of these algorithms relies on using Gaus- 
sian random multipliers, but our tests show that random Toeplitz multipliers are as effective. This 
suggests formal and experimantal study of various other random structured and sparse multipliers 
that depend on smaller numbers of random parameters. Note the recent success of Tropp [Tllj in 
this direction. 
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1.3 Related work 



Preconditioning of linear systems of equations is a classical subject [A94) . |B02) . |G97| . Randomized 
multiplicative preconditioning for numerical stabilization of GENP was proposed in |PGMQ[ Section 
12.2] and |PQZa| , but with no formal support for this approach. On low-rank approximation we refer 
the reader to the survey |HMT11) . We cite these and other related works throughout the paper and 
refer to |PQZb[ Section 11] on further bibliography. For a natural extension of our present work, 
one can combine randomized matrix multiplication with randomized augmentation and additive 
preprocessing of |PGMQ| , jPIMRlO) . [PQlOl , [PQT2] , |PQZC| , |PQZb| , [PY09] . 

1.4 Organization of the paper and selective reading 

In the next section we recall some definitions and basic results. We estimate the condition numbers 
of Gaussian random matrices in Section |3] and of randomized matrix products in Section 21 where 
we also comment on numerical stabilization of GENP by means of randomized multilication. In 
Sections [S] and [B] we apply randomized matrix multiplication to approximate the leading and trailing 
singular spaces of a matrix having a small numerical rank, approximate this matrix by a low-rank 
matrix, and point out applications to tensor decomposition and to approximation by structured 
matrices. In Section [7] we cover numerical tests, which constitute the contribution of the second 
author. In Appendix A we estimate the probability that a random matrix has full rank under the 
uniform probability distribution. In Appendix B we compute the numerical rank of a matrix by 
using randomization but neither pivoting nor orthogonaliztaion. 

2 Some definitions and basic results 

We assume computations in the field M of real numbers. 

Hereafter "flop" stands for "arithmetic operation" ; "expect" and "likely" mean "with probability 
1 or close to 1" (we do not use the concept of the expected value), and the concepts "large" , "small" , 
"near" , "closely approximate" , "ill conditioned" and "well conditioned" are quantified in the context. 
Next we recall and extend some customary definitions of matrix computations [GL96] . [S98j . 

2.1 Some basic definitions on matrix computations 

jgmxn -j-j^g class of real m x n matrices A — {ai,j)^j^. 

(Bi I ... \ Bk) = {Bj)j^i is a 1 X fc block matrix with blocks Bi, . . . ,Bk. diag(_Bi, . . . ,Bk) — 
diag(i?j)*^^2 is a fc X fc block diagonal matrix with diagonal blocks Bi, . . . , Bk- 

Gi is the ith coordinate vector of dimension n for i = 1, . . . , n. These vectors define the identity 
matrix J„ — (ei | . . . | e„) of size n x n. Ok^i is the fc x I matrix filled with zeros. We write / and 

where the size of a matrix is not important or is defined by context. 

is the transpose of a matrix A. 

2.2 Range, rank, and generic rank profile 

Ti-iA) denotes the range of an m x n matrix A, that is the linear space {z : z — Ax.} generated by 
its columns. rank(A) = dmiTZ{A) denotes its rank. A^j^"^ denotes the leading, that is northwestern 
k X k block submatrix of a matrix A. A matrix of a rank p has generic rank profile if all its leading 

1 X i blocks are nonsingular for i = 1, . . . , p. If such matrix is nonsingular itself, then it is called 
strongly nonsingular. 

Fact 2.1. The set M of m x n matrices of rank p is an algebraic variety of dimension (m + n — p)p. 
Proof. Let Af be an m x n matrix of a rank p with a nonsingular leading p x p block A/qo and 

write M = ( ^^J'^ ] • Then the (to — p) x in — p) Schur complement Mn — Mn^MZr}' Mqi must 

\MiQ Mil J 

vanish, which imposes (m — p){n — p) algebraic equations on the entries of M. Similar argument 
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can be applied where any p x p submatrix of the matrix M (among ( / i j ^^'"'^ submatrices) is 
nonsingular. Therefore dimM = mn — (to — p){n — p) = (to + n — p)p. □ 

2.3 Orthogonal, Toeplitz and circulant matrices 

A real matrix Q is called orthogonal if Q^Q = I or QQ^ — I. In Section[7]we write Q{A) to denote 
a unique orthogonal matrix specified by the following result. 

Fact 2.2. fGL96[ Theorem 5.2.2]. QR factorization A ~ QR of a matrix A having full column rank 
into the product of an orthogonal matrix Q = Q{A) and an upper triangular matrix R = R{A) is 
unique provided that the factor R is a square matrix with positive diagonal entries. 

A Toeplitz m X n matrix Tm.n — {ti-j)Tj=i defined by its first row (i-/i)/J=o subvector 
{th)2=i of its first column vector. Circulant matrices are the subclass of Toeplitz matrices where 
tg = th if \g-h\^ n. 

Theorem 2.1. 0{{m + n) log(m + n)) flops suffice to multiply anmx n Toeplitz matrix by a vector. 

2.4 Norms, SVD, generalized inverse, and singular spaces 



\\A\\h is the ft,-norm and ||A||f = \jY^Tj=i l^ijP is the Frobenius norm of a matrix A = (aij)™'"i- 
We write \ \A\ \ — \ \A\\2 and ||v|| — V v^v — ||v||2 and recall from }GL96[ Section 2.3.2 and Corollary 



max™^'"Jaij| < P|| = \\A^\\ < max"^;2i|aij|, 



2.3.2] that 

^PIIi<ll^ll<V^PIIi, Piii^p^iu, p|p<p||ipiu, (2.1) 

\\A\\<\\A\\f<V^\\A\\, (2.2) 
< for l,2,oo and any matrix product AB. (2.3) 

Define an SVD or full SVD of an to x n matrix A of a rank p as follows, 

A - Sa^aT^- (2.4) 

Here SaS'^ = S'^Sa = Im, TaTJ = TJTa = /„, = diag(i;A, 0,„_p,„_p), Ea = diag(crj(A))^^;^, 
cTj = crj{A) = ajlA^) is the jth largest singular value of a matrix A for j — I, . . . , p, and we write 
cTj = for j > p. These values have the minimax property 

a-j ~ max min ll^x||, i — l,...,p, (2.5) 

^ dim(S)=i xes, ||x||=i " " ' 'f^' y J 

where S denotes linear spaces |GL96[ Theorem 8.6.1]. Consequently dp > 0, ai = max||x||=i ||^x|| = 
ll^ll- 

Fact 2.3. If Aq is a submatrix of a matrix A, then crj[A) > (7j(ylo) for all j. 

Proof. |GL961 Corollary 8.6.3] implies the claimed bound where Aq is any block of columns of the 
matrix A. Transposition of a matrix and permutations of its rows and columns do not change 
singular values, and thus we can extend the bounds to all submatrices Aq. □ 



A'^ = Ta diag(E^^, On~p,m-p)S'j^ is the Moore-Penrose pseudo-inverse of the matrix A of 
and 

\\A+\\ = l/a,{A) (2.6) 

for a matrix A of a rank p. A+'^ stands for (A+)^ = (A^)^, and stands for (A^^)^ = (A^)~^ 
In Sections[5H6]we use the following definitions. For every integer k in the range 1 < fc < rank(A) 
define the partition Sa = iSk,A \ SA,m-k) and Ta — {Tk,A \ TA.n~k) where the submatrices Sk,A 
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and Tk^A are formed by the first k columns of the matrices Sa and Ta, respectively. Write T,k.A — 
diag(crj (A))*^^j, §k,A = Ti-iSk^A) and Tk,A = T^iTk^A)- If ffc > (Tfc+i, then ^k,A and T^^^ are the left 
and right leading singular spaces, respectively, associated with the k largest singular values of the 
matrix A, whereas their orthogonal complements E>A.m-k = Ti-iS A,m~k) and T^.„_fc = TZ{TA,n-k) 
are the left and right trailing singular spaces, respectively, associated with the other singular values 
of A. The pairs of subscripts {fc, A} versus {A,m — k} and {A,n — k} mark the leading versus 
trailing singular spaces. The left singular spaces of A are the right singular spaces of and vice 
versa. All matrix bases for the singular spaces Sk,A and T^.^ are given by matrices Sk,AX and 
Tfc^^y, respectively, for nonsingular k x k matrices X and Y. Orthogonal matrices X and Y define 
orthogonal matrix bases for these spaces. B is an approximate matrix basis for a space S within a 
relative error norm bound r if there exists a matrix E such that _B + _E is a matrix basis for this 
space S and if ||£'|| < T||i?||. 

2.5 Condition number, numerical rank and generic conditioning profile 

k{A) — = 1 1^1 1 11^^ 1 1 is the condition number of an m x n matrix A of a rank p. Such matrix 

is ill conditioned if cri(A) ^ o'piA) and is well conditioned otherwise. See [D83 , [GL96, Sections 
2.3.2, 2.3.3, 3.5.4, 12.5], [H02l Chapter 15], |KL94j . [S98l Section 5.3], on the estimation of matrix 
norms and condition numbers. 

An m X n matrix A has numerical rank, denoted nrank(A) and not exceeding rank(A), if the 
ratios f7j(A)/||A|| are small for j > nrank(A) but not for j < nrank(A). 

Remark 2.1. One can specify the adjective "small" above as "smaller than a fixed positive toler- 
ance". The choice of the tolerance can be a challenge, e.g., for the matrix diag(l.l~^)^?Lg. 

If a well conditioned m x n matrix A has a rank p < I — min{m, n}, then almost all its close 
neighbours have full rank I (see Section 13.21) , and all of them have numerical rank p. Conversely, 
suppose a matrix A has a positive numerical rank p = nrank(j4) and truncate its SVD by setting 
to all its singular values, except for the p largest ones. Then the resulting matrix A — E is well 
conditioned and has rank p and ||i?|| — iTp+i(A), and so A — E is a rank-p approximation to the 
matrix A within the error norm bound ap+i{A). At a lower computational cost we can obtain rank-p 
approximations of the matrix A from its rank- revealing factorizations GE96], |HP92) . |P00a] . and 
we further decrease the computational cost by applying randomized algorithms in Section [5] 

An mxn matrix has generic conditioning profile (cf. the end of Section [221) if it has a numerical 
rank p and if its leading i x i blocks are nonsingular and well conditioned for i — 1, . . . , p. If such 
matrix has full rank (that is if p = min{m,n}) and if it is well conditioned itself, then we call it 
strongly well conditioned. The following theorem shows that GENP and block Gaussian elimination 
applied to a strongly well conditioned matrix are numerically safe. 

Theorem 2.2. Cf. \PQZa\ Theorem 5.1]. Assume GENP or block Gaussian elimination applied to 
an n X n matrix A and write N = \\A\\ and = max"^]^ ||(Aj"'^)~"'^||. Then the absolute values of 
all pivot elements of GENP and the norms of all pivot blocks of block Gaussian elimination do not 
exceed N + N'^ , whereas the absolute values of the reciprocals of these elements and the norms of 
the inverses of the blocks do not exceed N- . 

3 Ranks and conditioning of Gaussian random matrices 
3.1 Random variables and Gaussian random matrices 

Definition 3.1. Fy{y) = Probability{'-/ < y} (for a real random variable ^) is the cumulative dis- 
tribution function (cdf) 0/7 evaluated at y. Fg(^^ ,j-^{y) = ^^^^ exp(— ^^2^^'' )dx for a Gaussian 
random variable g{p,a) with a mean p and a positive variance , and so 

p — \a<-y<p^^a with a probability near 1. (3.1) 
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Definition 3.2. A matrix (or a vector) is a Gaussian random matrix (or vector) with a mean fi 
and a positive variance if it is filled with independent identically distributed Gaussian random 
variables, all having the mean /i and variance . of such Gaussian random m x n 

matrices, which are standard for ^ — and — 1. By restricting this set to rax n Toeplitz matrices 
where only the m + n — 1 entries of the first row and column are independent we obtain the set of 
■j'my.n Q^ussian random Toeplitz matrices . Likewise we obtain the set -H^'^" o/ Gaussian random 
circulant matrices, where only n entries of the first row are independent. 

3.2 Nondegeneration of Gaussian random matrices 

The total degree of a multivariate monomial is the sum of its degrees in all its variables. The total 
degree of a polynomial is the maximal total degree of its monomials. 

Lemma 3.1. [DLlSl , JS8C^ . \Z79l . For a set IS. of a cardinality | A| m any fixed ring let a polynomial 
in m variables have a total degree d and let it not vanish identically on this set. Then the polynomial 
vanishes in at most d|A|'"~'^ points. 

We assume that Gaussian random variables range over infinite sets A, usually over the real line 
or its interval. Then the lemma implies that a nonzero polynomial vanishes with probability 0. 
Consequently a Gaussian random general, Toeplitz or circulant matrix has generic rank profile with 
probability 1 because the determinant of any its block is a polynomials in the entries. Likewise 
Gaussian random general, Toeplitz and circulant matrices have generic rank profile with probability 
1. Hereafter, wherever this causes no confusion, we assume by default that Gaussian random general, 
Toeplitz and circulant matrices have generic rank profile. This property can be readily extended to 
the products and various functions of general, sparse and structured Gaussian random matrices. 
Similar properties hold with probability near 1 where the random variables are sampled under the 
uniform probability distribution from a finite set of a large cardinality (see the Appendix). 

3.3 Extremal singular values of Gaussian random matrices 

Besides having full rank with probability 1, Gaussian random matrices in Definition 13.21 are likely 
to be wen conditioned [D88] . [E88] . [ES05] . |CD05j . [BIT] , and even the sum M + A for MeM™^" 
and A S G^^^ is likely to be well conditioned unless the ratio (t/||M|| is small or large |SST06j . 

The following theorem states an upper bound proportional to y on the cdf Fi/\\A+\\iy)j that is 
on the probability that the smallest positive singular value 1/||A+|| = ai{A) of a Gaussian random 
matrix A is less than a nonnegative scalar y (cf. (12.6^ ) and consequently on the probability that the 
norm ||A+|| exceeds a positive scalar x. The stated bound still holds if we replace the matrix A by 
A — B for any fixed matrix B, and for B = Om,n the bounds can be strengthened by a factor t/l™~"l 
[ES05] . [CDife) . 

Theorem 3.1. Suppose A e 5™^", B e M™^", I = min{m,n}, x > 0, and y > 0. Then 
Fai(A-B){y) < 2.35 Vly/a, that is Probability{\\{A - B)+\ \ > 2.35xVl/cr} < l/x. 

Proof. For m = n this is }SST06| Theorem 3.3]. Apply Fact 12. 31 to extend it to any pair {m, n}. □ 

The following two theorems supply lower bounds .^[[^[[(z) and Fi^(^A-^{y) on the probabilities that 
||A|| < z and k{A) < y for two scalars y and z, respectively, and a Gaussian random matrix A. We 
do not use the second theorem, but state it for the sake of completeness and only for square n x n 
matrices A. The theorems imply that the functions 1 — i^||^||(2:) and 1 — i^K(A)(2/) decay as z — > oo 
and y — cxi, respectively, and that the decays are exponential in — and proportional to \/\ogy/y, 
respectively. For small values ya and a fixed n the lower bound of Theorem 13.31 becomes negative, 
in which case the theorem becomes trivial. Unlike Theorem 13.11 in both theorems we assume that 
fi = 0. 

Theorem 3.2. JDSOll Theorem III]. Suppose A e ^"a h = max{m,ri} and z > 2ay/h. Then 
-l^\\A\\{z) > 1 — exp(— (z — 2(tV7i)^/(2ct^)), and so the norm \\A\\ is likely to have order a\fh. 
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Theorem 3.3. \SSTOa Theorem 3.1]. Suppose < cr < 1, y > 1, .4 € 5^^". Then the matrix A 
has full rank with probability 1 and F^(^j^-^{y) > 1 — (14.1 + 4.7-\/(2 In y) /n)n / [ycr) . 

Proof. See |SST06l the proof of Lemma 3.2]. □ 

4 Condition numbers of randomized matrix products and 
generic preconditioning 

Next we deduce probabilistic lower bounds on the smallest singular values of the products of fixed 
and random matrices. We begin with three lemmas. The first of them is obvious, the second easily 
follows from minimax property (|2.5p . 

Lemma 4.1. aj{SM) ~ <7j(MT) = aj(M) for all j if S and T are square orthogonal matrices. 

Lemma 4.2. Suppose E = diag(cr,) -Li, (Ji > (72 > ■ ■ ■ > <Jn, G £ M''^", H e K"^''. Then 
aj{GT,) > (Tj(G)(T„, aj(T,H) > aj{H)an for all j. If also (T„ > 0, then rank(G'I]) = rank(G'), 
rank(Ei/) = rank(i7). 

Lemma 4.3. ^'SST06, Proposition 2.2]. Suppose H € SJ^^"; SS'^ = S'^ S = TT^ = T^T = /„. 

Then SH e g]^]^-^ and HT G 

The following theorem implies that multiplication by standard Gaussian random matrix is un- 
likely to decrease the smallest positive singular value of a matrix dramatically, even though UV = O 
for some pairs of rectangular orthogonal matrices U and V . 

Theorem 4.1. Suppose G' G G'^.'^J^ , H' e GJ^^J , M e G = G' + U,H = H' + V for 

some matrices U and V, r{M) = rank(M), a; > and y > 0. Then i^i/||(GM)+|| (y) ^ P{yTM,a) 
and Fi/\\^MH)+\\{v) < F{y,M,a) for F{y,M,cr) = 2.35yVr\\M+\\/a andr^ min{r, r(Af)}, that is 
Probability{\\P+\\ > 2.35xV^\\M+\\/<7} < 1/x for P = GM and P = MH. 

Proof. With probability 1, the matrix MH has rank r because H e Q^^^J ■ So (cf. p.SI) ') 

^i/||(A.m)+||(y) ^ F^^(^MH)iy)- (4.1) 

Let M = Sm^mTIi be fuh SVD where Sm = diag(SM,0) = Sm diag(/^(Af), O) and Sm = 
diag((Tj(M))^fj*f^ is a nonsingular diagonal matrix. We have MH — Sm^mTJ^H, and so aj{MH) = 
aj{YjMTjjH) for all j by virtue of Lemma 14. 1[ because Sm is a square orthogonal matrix. Write 
Hr(M) = {Ir(M) I 0)TJ.^H and observe that aj{YiMTj^H) = aj{YjMHr{M)) and consequently 

a,{MH) = (Tj{tMHr(M)) for ah j. (4.2) 

Combine equation (|4.2p for j = r with Lemma [4.21 for the pair (S,iJ) replaced by (S^f , iJr(Af)) 
and obtain that ar{MH) > ar(M){M)ap{Hr(M)) = ap{Hr(M))/\\M+\\. We have T^jH' e GJ^^ by 
virtue of Lemma |4.3[ because Tm is a square orthogonal matrix; consequently H^^m) — ^r(M) ^ ^ 

for H'^i^j^j-^ e ^J^^CT^''^'^ and some matrix B. Therefore we can apply Theorem 13.11 for A ~ Fl'^^j^j-^ 
and obtain the bound of Theorem 14.11 on ^i/||(7\/_ff)+||(y)- One can similarly deduce the bound on 
Fi/\\{GM)+\\{y) or can just apply the above bound on i^i/||(7\/_ff)+||(2/)) for H = G^ and M replaced 
by M^ and then recall that [M^G'^Y = GM. □ 

By combining p.3p with Theorems 13.21 (for B = O) and 14.11 we can probabilistically bound 
the condition numbers of randomized products GM and MH . The following corollary extends the 
bound of Theorem 14.11 for a randomized matrix product to the bounds for its blocks. 
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Corollary 4.1. Suppose j, k, m, n, q and s are integers, I < j < q, 1 < k < s, M R™^", 
cr > 0, G e Gf/";", H e G'l^l', rank(Afj) = j for M, ^ M ( ), rank(Af('=)) = k for M^^) = 

(Ik I Ok.m-k)M , and y > 0. Then (i) with probability 1 the matrix GM (resp. MH) has full rank 
i/rank(M) > q (resp. i/rank(Af) > s). Furthermore (ii) ^i/||((gm)<^')+|| (j') - 2-35yv(7p^/ll/o' */ 
rank(M) > |((Mff) 1 1 (j^) ^ 2.35yV^||(AfW)+||/a i/rank(Af) > k. 

Proof. We immediately verify part (i) by applying the techniques of Section 13.21 To prove part (ii) 
apply Theorem 14.11 replacing G by {Ij \ Oj,q-j)G and replacing M by M ( J. For every k 

apply Theorem 14. II replacing M by {Ik \ Ok^rn-k)M and replacing H hy H [ ) . □ 



-k.k 



Corollary 14.11 can be immediately extended to any block of the matrices GM and MH ^ but 
we single out the leading blocks because applications of GENP and block Gaussian elimination 
are numerically safe where these blocks are nonsingular and well conditioned. We have empirical 
evidence that such applications are numerically safe even where we use circulant multipliers G and 
H filled with ±1 and where randomization is restricted to choosing the signs ± (see Tables mi and 
17. 3p . The paper [Tllj provides some formal support for using some other structured randomized 
multipliers. 

5 Approximate bases for singular spaces, low-rank approxi- 
mation, and the computation of numerical rank 

5.1 Randomized low-rank approximation: an outline and an extension to 
approximation by structured matrices 

Supppose we seek a rank-p approximation to a matrix A that has a numerical rank p. We can solve 
this problem by computing the SVD of the matrix A or its rank-revealing factorization |GE96) . 
|HP92] . |P00aj . but in this section we cover alternative numerically stable and noncostly solutions 
based on randomized matrix multiplication. As by-product we obtain approximate matrix bases for 
the left or right leading singular space fp^A and Sp^^i. 

Let us supply further details. Our next theorem expresses a rank-p approximation to a matrix 
A through a matrix basis for any of the two leading singular spaces ^p^A any §p,A- Theorem 15.21 
of Section 15.31 supports randomized computation of such an approximate basis for the space Tp 
from the product A^G for G e G^i''* ■ The paper |T11| formally supports this algorithm for 
a special class of random structured multipliers G, and our tests consistently show such support 
where G e 7^™^''^ (see Tables \T7I\ and [73| . We conjecture that the same is true for various other 
classes of sparse and structured multipliers G, defined by much fewer random parameters compared 
to the number of the entries. We specify a low-rank approximation algorithm in Section [5. 4| which 
has important applications to matrix computations, many listed in [HMTll] . For a natural extension 
assume a matrix W having a possibly unknown numerical displacement rank d, that is lying near 
some matrices with a displacement rank d (see the definitions in |KKM79| . [BMOl] . |P01| ). We can 
compute one of these displacements as a rank-d approximation to the displacement of the matrix 
W , and then immediately recover a structured matrix approximating the matrix W . 



5.2 Low-rank approximation via the basis of a leading singular space 

Next we prove that both orthogonal and nonorthogonal projections of a matrix A onto its leading 
singular spaces Tp.^ and Sp_A approximate the matrix within the error norm (jp^i{A). 

Theorem 5.1. Suppose A is an m x n matrix, S'aS^TJ is its SVD of \2.4^ , p is a positive integer, 
p < I = min{m, n}, and T and S are matrix bases for the spaces Tp,A and Sp^A, respectively. Then 

\\A^ AT{T^T)-^T^\\ = \\A- S{S^S)-^S^A\\=ap+i{A). (5.1) 
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For orthogonal matrices T and S we have T^T — S'^ S = Ip and 



P-^TT^II = \\A-SS^A\\=ap+^iA). (5.2) 
Proof. Write P = Tp^aTJa and r = n - p, observe that T'^Tp^A = (^q'' ^ , and obtain AP = 

Sa^aTITp^aTIj, = Sa^a (^^-^^ , whereas A = Sa^^a (^^|^) • Hence A- AP = Sa^a - 

Sa diag(Op.p, Sp)Tj, where Sp is the (m — p) x (n — p) diagonal matrix with the diagonal entries 
CTp+i, . . . , (T;. Thus \\A — AP\\ — ||Sp|| = <7p^i{A) because Sa and Ta are square orthogonal 
matrices. This proves the estimates (|5.1I) and (15.21) for T = Tp,A- Let us extend them to any matrix 
basis T for the space '^p,A, that is for T — Tp^U where C/ is a nonsingular matrix. Recall that 
Tj^A^p^A = Ip, and obtain successively {U^TjjTp^AU)-^ = U'^U-^ , U{U^Tj^^Tp^AU)-^U^ = Ip, 
and T{T^T)-^T'^ Tp^aU{U'^TJaTp,aU)~^U'^TJa = Tp^aTJa, implying the desired extension. 
Apply the proof to the transpose to extend it to matrix bases S for the space §p,A- CH 



5.3 A basis of a leading singular space via randomized products 

The following theorem supports randomized approximation of matrix bases for the leading singular 
spaces fp^A and Sp.^ of a matrix A. 

Theorem 5.2. Suppose A e K"^", e g^^'', and G e G^^'' and write S = AH and T ^ A^G . 
Then (i) rank(T) = rank(5') = min{p, rank(yl)} with probability 1 and (ii) nrank(T) = nrank(5) — 
min{p, nrank(A)} with probability close to 1. (Hi) Furthermore with probability close to 1 we have 

S + = Sp^aU tmdT + = Tp^AV (5.3) 

for two matrices A and A' having norms of order CTp+i(A) and for two nonsingular matrices U and 
V having condition numbers of at most order \\A\\/(ap{A)y^). 

Proof. We prove the claims about the matrix S, then apply them to the transpose A^ to extend to 
the matrix T. The techniques of Section l3^ support part (i). By truncating the SVD of the matrix 
A to the level of its numerical rank p_ we obtain a well conditioned matrix having both rank and 
numerical rank p- . Then we deduce part (ii) from Theorem 14.11 

Proving part (iii) we assume w.l.o.g. that rank(yl) > p. (Otherwise infinitesimal perturbation 
of the matrix A could yield this bound.) Write SVD A = Sa^aTa, 'Sp,A = diag(crj(^))^^j, 
U = T^p^aTJaH, and Ap = Sp^AU = Sa diag(i;p,A, 0„_p^„_p) Tj = Sp^a'^p^aTJa- Note that 
||Sa - diag(i]p,A,0,„-p,n-p)|| = crp+i(^)- Consequently \\A - Ap\\ = ap+i{A), AH ^ ApH + A, 
where ||A|| < ap+i{A) \\H\\, and the norm \\H\\ is likely to be bounded from above and below by 
two positive constants (see Theorem 13. 2p . This implies (|5.3I) . 

With probability 1 the p x p matrices U = Yjp AB and B = Tp aH are nonsingular (see 
Section [3.2p . Next we assume that they are nonsingular and estimate k{U). Clearly ||J7|| < 
\\^p,a\\ \\TJJ\ \\H\\ where ||Sp,A|| = \\A\\, \\TjJ\ - 1. Therefore \\U\\ < \\A\\ \\H\l which is 
likely to have order ||A||. 

Furthermore we have ||C/"^|| < ll^^^ll = WB^^W/f^pi^)- Apply Theorem lO where 

M — TJj^, r = p and ariM){M) = a = 1 and obtain that the norm is likely to have at most 

order 1/y/p. Therefore with probability close to 1, the norm = ||B-i|l/CTp(A) has at most 

order l/{ap{A)y^p), and then k{U) = ||;7|| \\U^^\\ has at most order \\A\\/{ap{A)y^p). □ 



5.4 A prototype algorithm for low-rank approximation 

Together Theorems 15.11 and 15.21 imply correctness of the following prototype algorithm where we 
assume that the input matrix has an unknown numerical rank and we know its upper bound. The 
algorithm employs approximation of a leading singular space of the input matrix. 

Proto- Algorithm 5.1. Rank-p approximation of a matrix (cf. \HMT11\ Section 10.3]). 
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Input: A matrix A G having an unknown numerical rank p, an integer pj^ > p, and two 

tolerances t and t' of order Up+i (A)/| jAj | . (We can choose r at Stage 2 based on rank revealing 
factorization of an auxiliary n x p^ matrix. The computation of this factorization is noncostly 
where p is small. We can choose t' at Stage 3 based on the required output accuracy, and can 
adjust both tolerances if the algorithm fails to produce a satisfactory output.) 

Output: FAILURE (with a low probability) or an integer p and two matrices T S R"^'' and 
Ap e M™^", both having ranks at most p and such that \\Ap — ^|| < ''"'1 1^1 1 O'l^'d T satisfies 
bound lEIM) of Theorem\Mfor ||A'|| < tP||. 

Computations: 

1. Compute the n x p^ matrix T' = A^G for G G Gq'^i'^^ • 

2. Compute a rank revealing factorization of the matrix T' and choose the minimal integer 
s and an n X s matrix T such that \\T' — [T \ On,p^-s)\\ < ''"H^H- 

3. Compute the matrix As = AT{T^T)^^T^ . Output p — s, T and Ap and stop if \\Ap — 
^11 < Otherwise output FAILURE and stop. 

Assume a proper choice of both tolerances r and r'. Then by virtue of Theorem 15.21 we can 
expect that at Stage 2 we obtain s = p and an approximate matrix basis T for the singular space Tp_^ 
(within an error norm of at most order Cp+i (A)). If so, Stage 3 outputs FAILURE with a probability 
near 0, by virtue of Theorems 15. li and in the case of FAILURE we can reapply the algorithm for 
new values of random parameters or for the adjusted tolerance values r and r'. At Stage 2 we have 
s < p because nrank(^"^G) < nrank(A) — p, whereas for a sufhciently small tolerance r' the bound 
\\Ap ^ ^11 < ''"'11(^)11 a-t Stage 3 implies that s > nrank(74). These observations enable us to certify 
correctness of the outputs p, T, and Ap of the algorithm. 

We can similarly approximate the matrix A by a rank-p matrix S{S'^ S)^^ S'^ A, by first computing 
the matrix S' = AH for H G Gq^^^ , then computing its rank revealing factorization, which is 
expected to define an approximate matrix basis S for the space §p,A, and finally applying Theorem 
15. li to approximate the matrix A by a rank-p matrix. 

Remark 5.1. For p^ — p we can write T — T' — A^ G and skip Stage 2 because the matrix A^ G 
is expected to serve as a desired approximate matrix basis by virtue of Theorem \5.2l In Appendix B 
we compute numerical rank by using randomized matrix multiplications instead of standard recipes 
that use orthogonalization or pivoting. 

Remark 5.2. The increase of the dimension p+ beyond the numerical rank p (called oversampling 
in \HMTllf ) is relatively inexpensive if the bound is small. \HMT1 Ij suggests using small over- 
sampling even if the numerical rank p is known, because we have 

Probability {\\A - ATT'^\\ < (1 + p+ min{m, n})ap+i{A)} > 1 - 3{p+ - pY^f+ for p+ > p. 

Theorem ] 5. 2[ however, bounds the norm \\A — ATT'^W strongly also for p = p^, in good accordance 
with the data of Tables \T^and\T^ 

Remark 5.3. For a larger integer p we can substantially simplify Stage 1 of the algorithm by 
choosing structured multipliers G from the class of the subsample random Fourier transforms, called 
SRFTs. Under this choice the estimated probability of obtaining low rank approximation is close to 
the above case of Gaussian random multipliers G ^Tllf . Our tests in Section provide informal 
empirical support for similar use of random Toeplitz multipliers G. 

Remark 5.4. By applying rank revealing QR factorization at Stage 2 of the aljjorithm we can 
produce an orthogonal matrix T and consequently simplify Stage 3 by computing Ag = ATT^ (cf. 
hd.'^) ). We adopted such a variation of the algorithm in our tests in Section^ 
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Remark 5.5. One can weaken reliability of the output to simplify Stage 3 by testing whether 
WK"^ {A — Ap)L\\ < t\\K\\ \\A\\ \\L\\ for matrices K G G^i'' and L £ Gq\'' and for two small 
positive integers p' and p" , possibly for p' = p" = 1, instead of testing whether WAp — AW < t'||(j4)||. 
One can similarly simplify Stage 2. 

Remark 5.6. Application of Proto- Algorithm ] 5. 1\ to the approximation of the leading singular spaces 
Tp,A and Bp, a is facilitated and its power is enhanced as the gaps increase between the singular values 
of the input matrix A. This motivates using the power transforms A Bh = {AA'^y''A for positive 
integers h because aj{Bh) = {aj{A))'^^'^^ for all j . 

Remark 5.7. Clearly every matrix basis of the trailing singular space „_p of an m x n matrix 
A is orthogonal to every matrix basis of the leading singular space Tp A, and similarly for any pair 
of matrix bases of the spaces SA,m~p ond Sp.a- One can exploit this duality for the computation and 
approximation of the bases. 

6 Application to Tensor Train decomposition 

Let 

A= (6.1) 

denote a c?-dimensional tensor with entries A{ii, . . . ,id) and spacial indices ii, . . . ,id ranging from 
1 to rii, . . . , Hd, respectively. Define the d — 1 unfolding matrices Ak = [A{ii ■ . .ik', ik+i ■ ■ ■ id)], k = 
I, . . . ,d, where the semicolon separates the multi-indices ii . . .ik and ik+i ■ . - id, which define the 
rows and columns of the matrix Ak, respectively, k = l,...,d. The paper |O09j proposed the 
following class of Tensor Train Decompositions, hereafter referred to as TT Decompositions, where 
the summation indices ai, . . . , a^-i ranged from 1 to compression ranks ri, . . . , rd-i, respectively, 

T= ^ Gi{ii,ai)G2{ai,ii,a2) ■ ■ ■ Gd-i{ad-2,id-i,a-d-i)Gd{ad,id). (6.2) 

cn,...,ad-l 

Theorem 6.1. \O09^ . For any tensor A of \6.1]) there exists a TT decomposition h6.2\) such that 
A = T and rk = rank(^fc) for k — 1, . . . ,d ~ 1. 

There is a large and growing number of important applications of TT decompositions (j6.2p to 
modern computations (cf. e.g., |OT09| . |OT10) . |0T11) ) where the numerical ranks of the unfolding 
matrices Ak are much smaller than their ranks, and it is desired to compress TT decompositions 
respectively. 

Theorem 6.2. JOT 10^ . For any tensor A of 116.1]} and any set of positive integers rk < rank(y4fe), 
k = 1, . . . , d ^ 1, there exists a TT decomposition i6.2\) such that 

d-l 

||A - Till < Vr|, Tfc = min \\Ak - B\\f, k = 1, . . . ,d - 1. (6.3) 

rank(B)=rfc 

The constructive proof of this theorem in |OT10] relies on inductive approximation of unfolding 
matrices by their SVDs truncated to the compression ranks r^. Let us sketch this construction. 
For d = 2 we obtain a desired TT decomposition r(?i,?2) = X]q\ '^i)'^2(ai, 12) (that is 

a sum of ri outer products of ri pairs of vectors) simply by truncating the SVD of the matrix 
A{ii,i2). At the inductive step one truncates the SVD of the first unfolding matrix Ai = Sai'^AiT'^^ 
to obtain rank-ri approximation of this matrix Bi = Sbi^BiT^.^ where = diag((Tj (^i))^;Lj^ 
and the matrices Sbi and Tb^ are formed by the first ri columns of the matrices Sai and Ta^ , 
respectively. Then it remains to approximate the tensor B = [B{ii, . . . ,id)] represented by the 
matrix Bi. Rewrite it as '^^_^^iSBi{ii',cti)A{ai;i2 . ■ .id) for A ~ Ssi^Si' represent A as the 
tensor A — [A(aii2,i3, . . . ,id)] of dimension 0? — 1, apply the inductive hypothesis to obtain a 
TT-approximation of this tensor, and extend it to a TT-approximation of the original tensor A. 
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In [OTlOj the authors specify this construction as their Algorithm 1, prove error norm bound 
(|6.3p . then point out that the "computation of the truncated SVD for large scale and possibly dense 
unfolding matrices ... is unaffordable in many dimensions", propose "to replace SVD by some other 
dyadic decompositions Ak ~ UV'^ , which can be computed with low complexity", and finally specify 
such recipe as (OTIOI Algorithm 2], which is an iterative algorithm for skeleton or pseudoskeleton 
decomposition of matrices and which they use at Stages 5 and 6 of their Algorithm 1. The cost of 
each iteration of |OT10( Algorithm 2] is quite low, and empirically the iteration converges fast, but 
the authors welcome alternative recipes having formal support. 

Proto- Algorithm IS . 1 1 can serve as an alternative to [OTIO,, Algorithm 2]. For the input matrix Ai 
above we use 0{ri) multiplications of this matrix by 0{ri) vectors, which means a low computational 
cost for sparse and structured inputs, whereas the expected output is an approximate matrix basis 
for the space Sri,Ai or Tr-^,Ai and a rank-ri approximation to the matrix Ai, within an expected 
error norm in 0{(Jr-^+i{Ai)). This is the same order as in jOTlOi Algorithm 1], but now we do not 
use SVDs. One can further decrease the error bound by means of small oversampling of Remark l5.2l 
and the power transform of Remark 15.61 

Remark 6.1. A huge bibliography on tensor decompositions and on thier application to fundamental 
matrix computations has been recently surveyed in \KB09^ . hut with the omission of the early works 
JPT^l, JPTUI, 'B801, 'P84^, \B86f . where nontrivial tensor decompositions helped to accelerate the 
fundamental operation of matrix multiplication, probably the first application of this kind. 

7 Numerical Experiments 

Our numerical experiments with random general, Hankel, Toeplitz and circulant matrices have been 
performed in the Graduate Center of the City University of New York on a Dell server with a dual 
core 1.86 GHz Xeon processor and 2G memory running Windows Server 2003 R2. The test Fortran 
code has been compiled with the GNU gfortran compiler within the Cygwin environment. Random 
numbers have been generated with the random_number intrinsic Fortran function, assuming the 
uniform probability distribution over the range {x : —1 < x < 1}. The tests have been designed by 
the first author and performed by his coauthor. 

7.1 GENP with random circulant multipliers 

Table FTT] shows the results of our tests of the solution of a nonsingular well conditioned linear system 
Ay = b of n equations whose coefficient matrix has ill conditioned n/2 x n/2 leading principal block 
for n = 64, 256, 1024. We have performed 100 numerical tests for each dimension n and computed 
the maximum, minimum and average relative residual norms \\Ay — b||/||b|| as well as standard 
deviation. GENP applied to these systems outputs corrupted solutions with residual norms ranging 
from 10 to 10^. When we preprocessed the systems with circulant multipliers filled with ±1 (choosing 
the n signs ± at random), the norms decreased to at worst 10^^ for all inputs. Table [73] also shows 
further decrease of the norm in a single step of iterative refinement. Table 2 in |PQZa| shows similar 
results of the tests where the input matrices have been chosen similarly but so that their every 
leading k x k block had numerical rank A: or fc — 1 and where Householder multipliers In — uv^ /u^v 
replaced the circulant multipliers. Here u and v denote two vectors filled with integers 1 and —1 
under random choice of the signs + and — . 

7.2 Approximation of the tails and heads of SVDs and low-rank appro- 
ximation of a matrix 

At some specified stages of our tests of this subsection we performed additions, subtractions and 
multiplications with infinite precision (hereafter referred to as error- free ring operations). At the 
other stages we performed computations with double precision, and we rounded to double precision 
all random values. We performed at most two refinement iterations for the computed solution of 
every linear system of equations and matrix inverse. 
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Tables 17.21 and 17.31 display the data from our tests on the approximation of leading singular 
spaces of the SVD of an n x n matrix A having numerical rank q and on the approximation of this 
matrix with a matrix of rank p. For n = 64, 128, 256 and p = 1,8, 32 we generated n x n random 
orthogonal matrices S and T and diagonal matrices S = diag(crj such that aj = 1/ j, j ^ 1, . . . , p, 
(jj ~ 10^^°, j = /9 + 1, . . . , n (cf. |H02[ Section 28.3]). Then we applied error-free ring operations 
to compute the input matrices A = S'^iE^TJ, for which ||A|| = 1 and k,{A) — 10^". Furthermore 
we generated random nx p matrices G (for p — 1,8,32) and successively computed the matrices 
Bp^A = A^G, Tp^A, Bp^AYp^A as a least-squares approximation to Tp^A, Qp.A = Q{Bp.A) (cf. Fact 
12. 2p . and A — AQ p^a{Q p.a)^ (by applying error-free ring operations). Table summarizes the data 
on the residual norms rn'^^ — \\Bp^AYp^A — Tp^A\\ and rn'-^^ = — AQp^AiQp,A)'^\\ obtained in 100 
runs of our tests for every pair of n and p. 

We have also performed similar tests where we generated random Toeplitz n x p matrices T 
(for p = 8, 32) and then replaced the above approximate matrix bases i?p,A = A'^G for the leading 
singular space Tp^A by the matrices -Bp, a = A^T. Table 17.31 displays the results of these tests. In 
both Tables 17.21 and 17.31 the residual norms are more or less equally small. 



Table 7.1: Relative residual norms: randomized circulant GENP for well conditioned linear systems 
with ill conditioned leading blocks (cf. |PQZa[ Table 2]) 



dimension 


iterations 


min 


max 


mean 


std 


64 





4.7 X 10"^* 


8.0 X IQ-ii 


4.0 X IQ-i^ 


1.1 X 10-11 


64 


1 


1.9 X 10-^^ 


5.3 X 10"^^ 


2.3 X 10-1" 


5.4 X 10-1'' 


256 





1.7 X 10-1^ 


1.4 X 10-^ 


2.0 X 10--^ 


1.5 X 10-« 


256 


1 


8.3 X 10-^^ 


4.3 X IQ-i" 


4.5 X IQ-i^ 


4.3 X 10-11 


1024 





1.7 X 10"^" 


4.4 X 10-'-' 


1.4 X 10--^ 


2.1 X 10-*^ 


1024 


1 


3.4 X 10-1* 


9.9 X IQ-'^^ 


6.8 X 10-1'' 


2.7 X 10-1" 



Table 7.2: Heads of SVDs and low-rank approximation by using random multipliers G 



q 


rrui 


n 


min 


max 


mean 


std 


1 




64 


2.35 X 10-1" 


1.32 X 10-"^ 


3.58 X 10-°^ 


1.37 X 10-"* 


1 


rn(i) 


128 


4.41 X 10-1" 


3.28 X 10-°* 


3.55 X 10-"^ 


5.71 X 10-"^ 


1 


rn^i-* 


256 


6.98 X 10-1" 


5.57 X 10-°* 


5.47 X 10-"^ 


8.63 X 10-"^ 


1 


i-n<.2) 


64 


8.28 X 10-1" 


1.32 X 10-°^ 


3.86 X 10-"^ 


1.36 X 10""* 


1 




128 


1.21 X 10-°^ 


3.28 X 10-°* 


3.91 X 10-"^ 


5.57 X 10-"9 


1 




256 


1.74 X 10-"^ 


5.58 X 10-°* 


5.96 X 10-°^ 


8.47 X 10-"9 


8 


rn^i-* 


128 


2.56 X 10-°^ 


1.16 X 10-°6 


4.30 X 10-"* 


1.45 X 10-"^ 


8 


rn*-!^ 


256 


4.45 X 10-°^ 


3.32 X 10-°^ 


3.40 X 10-"* 


5.11 X 10-"* 


8 




64 


1.46 X 10-°^ 


9.56 X 10-°* 


5.77 X 10-"^ 


1.06 X 10-"* 


8 


i-n(2) 


128 


1.64 X 10-"^ 


4.32 X 10-°^ 


1.86 X 10-"* 


5.97 X 10-"* 


8 




256 


2.50 X 10-"^ 


1.56 X 10-°^ 


1.59 X 10-"* 


2.47 X 10-"* 


32 




64 


6.80 X 10-°^ 


2.83 X 10-°*^ 


1.01 X 10-"^ 


3.73 X 10-"^ 


32 




128 


1.25 X 10-°^ 


6.77 X 10-°*^ 


1.28 X 10-"^ 


6.76 X 10-"^ 


32 




256 


1.85 X 10-"** 


1.12 X 10-°6 


1.02 X 10-"^ 


1.54 X 10-"^ 


32 




64 


1.84 X 10-"^ 


6.50 X 10-°^ 


2.30 X 10-"* 


8.28 X 10-"* 


32 




128 


3.11 X 10-°^ 


1.45 X 10-°6 


2.87 X 10-"* 


1.45 X 10-"^ 


32 




256 


4.39 X 10-"^ 


2.16 X 10-°^ 


2.37 X 10-"* 


3.34 X 10-"* 
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Table 7.3: Heads of SVDs and low-rank approximations by using random Toeplitz multipliers T 



q 


rrn''-* 


n 


min 


max 


mean 


std 


8 


rrn^^^ 


64 


2.22 X 10-°^ 


7.89 X 10-06 


1.43 X 10-07 


9.17 X 10-07 


8 


rrn^^^ 


128 


3.79 X 10-09 


4.39 X 10-05 


4.87 X 10-07 


4.39 X 10-06 


8 




256 


5.33 X 10-0'^ 


3.06 X 10-06 


6.65 X 10-08 


3.12 X 10-07 


8 


rrn^^^ 


64 


1.13 X 10-09 


3.66 X 10-06 


6.37 X 10-08 


4.11 X 10-07 


8 


j.j.j^(2) 


128 


1.81 X 10-09 


1.67 X 10-05 


1.90 X 10-07 


1.67 X 10-06 


8 


rrn^^^ 


256 


2.96 X 10-09 


1.25 X 10-06 


2.92 X 10-08 


1.28 X 10-07 


32 




64 


6.22 X 10-09 


5.00 X 10-07 


4.06 X 10-08 


6.04 X 10-08 


32 




128 


2.73 X 10-08 


4.88 X 10-06 


2.57 X 10-07 


8.16 X 10-07 


32 


rrn^"'^^ 


256 


1.78 X 10-08 


1.25 X 10-06 


1.18 X 10-07 


2.03 X 10-07 


32 


rrn^^^ 


64 


1.64 X 10-09 


1.26 X 10-07 


9.66 X 10-09 


1.48 X 10-08 


32 




128 


5.71 X 10-09 


9.90 X 10-07 


5.50 X 10-08 


1.68 X 10-07 


32 


j.j.j^(2) 


256 


4.02 X 10-09 


2.85 X 10-07 


2.74 X 10-08 


4.48 X 10-08 



8 Conclusions 

It is well known that random matrices tend to be well conditioned, and this property motivates our 
application of random matrix multipliers for advancing some fundamental matrix computations. We 
first prove the basic fact that with a probability close to 1 multiplication by a Gaussian random 
matrix does not increase the condition number of a matrix and of its any block dramatically com- 
pared to the condition number of the input matrix. As an immediate implication random multipliers 
are likely to stabilize numerically GENP (that is Gaussian elimination with no pivoting) and block 
Gaussian elimination applied to a nonsingular and well conditioned matrix, possibly having ill condi- 
tioned and singular leading blocks, by applying to input matrix randomized structured multipliers. 
Another basic fact states that with a probability close to 1 the column sets of the products A^G and 
AH where an m x n matrix A has a numerical rank p and G and H are Gaussian random matrices 
of sizes mx p and n x p, respectively, approximate some bases for the left and right leading singular 
spaces Sp,A and Tp^^ associated with the p largest singular values of the matrix A. Having any of 
such approximate bases available we can readily approximate the matrix A by a matrix of rank p, 
This has further well known extensions to many important matrix computations, and we point out 
new extensions to the approximation of a matrix by a structured matrix lying nearby, to computing 
numercial rank of a matrix, and to Tensor Train approximation. 

Our tests consistently showed efficiency of the proposed techniques even where instead of general 
Gaussian random multipliers we applied structured and sparse multipliers. In these cases random- 
ization was limited to much fewer random parameters or just to the choice of the signs ± of a few 
auxiliary vectors. The recent paper [Tllj is an important step toward understanding and exploiting 
this phenomenon and should motivate further research effort. Another natural research subject is 
the combination of randomized matrix multiplication with randomized techniques of additive pre- 
processing and augmentation, recently studied in [PGMQ] , (PIMRlOj . |PQ10| , |PQ12| , |PQZC| , 
|PQZa| , |PQZb| , and [PQZ^ . 

Appendix 

A Uniform random sampling and nonsingularity of random 
matrices 

Uniform random sampling of elements from a finite set A is their selection from this set at random, 
independently of each other and under the uniform probability distribution on the set A. 
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Theorem A.l. Under the assumptions of Lem,m,a \3.1\ let the values of the variables of the polynomial 
he randomly and uniformly sampled from a finite set A. Then the polynomial vanishes with a 
probability at most 

Corollary A.l. Let the entries of a general or Toeplitz ni x n matrix have been randomly and 
uniformly sampled from a finite set A of cardinality |A| (in any fixed ring). Let I = min{TO,n}. 
Then (a) every k x k submatrix M for k < I is nonsingular with a probability at least 1 — and 

(b) is strongly nonsingular with a probability at least 1 — X]^=i JS] ~ ^ ~ '^^[SJ^' 

Proof. The claimed properties of nonsingularity and nonvanishing hold for generic matrices. The 
singularity of a fc x fc matrix means that its determinant vanishes, but the determinant is a polynomial 
of total degree k in the entries. Therefore Theorem I A. II implies parts (a) and consequently (b). Part 

(c) follows because a fixed entry of the inverse vanishes if and only if the respective entry of the 
adjoint vanishes, but up to the sign the latter entry is the determinant of a (fc— 1) x (fc— 1) submatrix 
of the input matrix M, and so it is a polynomial of degree /c — 1 in its entries. □ 



B Computation of numerical ranks 

The customary algorithms for the numerical rank of a matrix rely on computing its SVD or rank 
revealing factorization and involving pivoting or orthogonalization. Proto- Algorithm 15.11 also uses 
rank revealing factorization at Stage 2 and matrix inversion or orthogonalization at Stage 3, but 
only with matrices of small sizes provided the integer is small. Our next alternative algorithm 
avoids pivoting and orthogonalization even where the numerical rank p is large. As by-product we 
compute an approximate matrix basis within an error norm in 0(CTp+i(^)) for the leading singular 
space ^ of an TO X n matrix A and can extend this readily to computing a rank-p approximation 
of the matrix A (see Remark iB.ll) . We let m > n (else shift to A^), let [p-,p+] = [0,n] unless we 
know a more narrow range, and successively test the selected candidate integers in the range p+] 
until we find the numerical rank p. To improve reliability, we can repeat the tests for distinct values 
of random parameters (see Remarks IB. ip . 

Exhaustive search defines and verifies the numerical rank p with probability near 1, but with 
proper policies one can use fewer and simpler tests because for G £ Gq^i^ (and empirically for 
various random sparse and structured matrices G as well) the matrix B = A^G is expected (a) to 
have full rank and to be well conditioned if and only \i s > p, (b) to approximate a matrix basis 
(within an error norm in 0((Tp+i (A))) for a linear space T D '^p,B ="^pA where s > p, and (c) to 
approximate a matrix basis (within an error norm in 0(crp+i(A))) for the space ^ where s = p. 
Property (a) is implied by Theorem 14. 11 properties (b) and (c) by Theorem 15.21 

Proto- Algorithm B.l. Numerical rank virithout pivoting and orthogonalization (see Re- 

Input: Two integers p_ and p^ and a matrix A G r™x" having unknown numerical rank p = 
rank(y4.) in the range [p^,p^] such that < p^ < p^ < n < m, a rule for the selection of a 
candidate integer p in a range and a Subroutine COND that determines whether a 

given matrix has full rank and is well conditioned or not. 

Output: an integer p expected to equal numerical rank of the matrix A and a matrix B expected to 
approximate (within an error norm in 0{apj^i{A))) a matrix basis of the singular space T^^- 
(Both expectations can actually fail, but with a low probability, see Remark \B.l[ ) 

Initialization: Generate matrix G S Gq^i^^ and write B = A, Gp ~ G{Ip \ Op.m-pY for p = 
p^, p^ + 1, . . . , (The m X p matrix Gp is formed by the first p columns of the matrix G.) 

Computations.- 

1. Stop and output p = p^ and the matrix B if P- = p+. Otherwise fix an integer p in the 
range [p-,p+\. 
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2. Compute the matrix B' — B^Gp and apply to it the Subroutine COND. 

3. If this matrix has full rank and is well conditioned, write = p and B = B' and go to 
Stage 1. Otherwise write p- = p and go to Stage 1. 

Remark B.l. The algorithm can output a wrong value of the numerical rank, although by virtue of 
Theorems \5.1\ and \5.2\ combined this occurs with a low probability. One can decrease this probability by 
reapplying the algorithm to the same inputs and choosing distinct random parameters. Furthermore 
one can fix a tolerance r of order (7p+i(A), set T = B, and apply Stage 3 of Proto- Algorithm \5.1\ 
Then nrank(A) is expected to exceed the computed value p if this stage outputs FAILURE and to 
equal p otherwise, in which case Proto- Algorithm \5.1\ also outputs a rank-p approximation of the 
matrix A (within an error norm T||yl|| in 0(CTp_|_i(A))J. For a sufficiently small tolerance r the 
latter outcome implies that certainly p > iirank(yl). 

Remark B.2. We can avoid pivoting and orthogonalization in Subroutine COND by applying the 
Power or Lanczos algorithms JBT^, llD83f . \KW92h We can first apply the Power Method to the 
matrix S = A'^ A or S = AA^ to yield a close upper bound cr^ on its largest eigenvalue cr1{A) 
and then to the matrix a^I — A^A to approximate the smallest eigenvalue of the matrix S , equal 
to a^(A). The Lanczos algorithm approximates both extremal eigenvalues of the matrix S — A'^A 
(equal to uf{A) and af{A), I = inin{m,n} respectively) and converges much faster \GL9(^ Sections 
9.1.4, 9.1.5]. 

Remark B.3. One can simplify Stage 2 by applying the Subroutine COND to the matrix G'p = FpGp 
of a smaller size (rather than to Gp) where Fp S ^/g i™- By virtue of Theorem \4-l\ the matrices Gp 
and Gp are likely to have condition numbers of the same order. 

Remark B.4. The binary search p — \{p^+ p^) /2\ is an attractive policy for choosing the candidate 
values p, but one may prefer to move toward p_ , the left end of the range more rapidly, to decrease 
the size of the matrix B' . 
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